{
 "cells": [
  {
   "cell_type": "raw",
   "id": "df29b30a-fd27-4e08-8269-870df5631f9e",
   "metadata": {},
   "source": [
    "---\n",
    "title: Extraction\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b84edb4e",
   "metadata": {},
   "source": [
    "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/extraction.ipynb)\n",
    "\n",
    "## Use case\n",
    "\n",
    "LLMs can be used to generate text that is structured according to a specific schema. This can be useful in a number of scenarios, including:\n",
    "\n",
    "- Extracting a structured row to insert into a database \n",
    "- Extracting API parameters\n",
    "- Extracting different parts of a user query (e.g., for semantic vs keyword search)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "178dbc59",
   "metadata": {},
   "source": [
    "![Image description](../../static/img/extraction.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97f474d4",
   "metadata": {},
   "source": [
    "## Overview \n",
    "\n",
    "There are two broad approaches for this:\n",
    "\n",
    "- `Tools and JSON mode`: Some LLMs specifically support structured output generation in certain contexts. Examples include OpenAI's [function and tool calling](https://platform.openai.com/docs/guides/function-calling) or [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode).\n",
    "\n",
    "- `Parsing`: LLMs can often be instructed to output their response in a dseired format. [Output parsers](/docs/modules/model_io/output_parsers/) will parse text generations into a structured form.\n",
    "\n",
    "Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).\n",
    "\n",
    "Functions and tools can infer things beyond of a provided schema (e.g., attributes about a person that you did not ask for)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbea06b5-66b6-4958-936d-23212061e4c8",
   "metadata": {},
   "source": [
    "## Option 1: Leveraging tools and JSON mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25d89f21",
   "metadata": {},
   "source": [
    "### Quickstart\n",
    "\n",
    "`create_structured_output_runnable` will create Runnables to support structured data extraction via OpenAI tool use and JSON mode.\n",
    "\n",
    "The desired output schema can be expressed either via a Pydantic model or a Python dict representing valid [JsonSchema](https://json-schema.org/).\n",
    "\n",
    "This function supports three modes for structured data extraction:\n",
    "- `\"openai-functions\"` will define OpenAI functions and bind them to the given LLM;\n",
    "- `\"openai-tools\"` will define OpenAI tools and bind them to the given LLM;\n",
    "- `\"openai-json\"` will bind `response_format={\"type\": \"json_object\"}` to the given LLM.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f5ec7a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install langchain langchain-openai \n",
    "\n",
    "# Set env var OPENAI_API_KEY or load from a .env file:\n",
    "# import dotenv\n",
    "# dotenv.load_dotenv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4c2bc413-eacd-44bd-9fcb-bbbe1f97ca6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional\n",
    "\n",
    "from langchain.chains import create_structured_output_runnable\n",
    "from langchain_core.pydantic_v1 import BaseModel\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "\n",
    "class Person(BaseModel):\n",
    "    person_name: str\n",
    "    person_height: int\n",
    "    person_hair_color: str\n",
    "    dog_breed: Optional[str]\n",
    "    dog_name: Optional[str]\n",
    "\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-4-0125-preview\", temperature=0)\n",
    "runnable = create_structured_output_runnable(Person, llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "de8c9d7b-bb7b-45bc-9794-a355ed0d1508",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Alex is 5 feet tall and has blond hair.\"\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02fd21ff-27a8-4890-bb18-fc852cafb18a",
   "metadata": {},
   "source": [
    "### Specifying schemas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5a74f3e-92aa-4ac7-96f2-ea89b8740ba8",
   "metadata": {},
   "source": [
    "A convenient way to express desired output schemas is via Pydantic. The above example specified the desired output schema via `Person`, a Pydantic model. Such schemas can be easily combined together to generate richer output formats:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c1c8fe71-0ae4-466a-b32f-001c59b62bb3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Sequence\n",
    "\n",
    "\n",
    "class People(BaseModel):\n",
    "    \"\"\"Identifying information about all people in a text.\"\"\"\n",
    "\n",
    "    people: Sequence[Person]\n",
    "\n",
    "\n",
    "runnable = create_structured_output_runnable(People, llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c5aa9e43-9202-4b2d-a767-e596296b3a81",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry')])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
    "Claudia is 1 feet taller Alex and jumps higher than him.\n",
    "Claudia is a brunette and has a beagle named Harry.\"\"\"\n",
    "\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53e316ea-b74a-4512-a9ab-c5d01ff583fe",
   "metadata": {},
   "source": [
    "Note that `dog_breed` and `dog_name` are optional attributes, such that here they are extracted for Claudia and not for Alex.\n",
    "\n",
    "One can also specify the desired output format with a Python dict representing valid JsonSchema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3e017ba0",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"type\": \"object\",\n",
    "    \"properties\": {\n",
    "        \"name\": {\"type\": \"string\"},\n",
    "        \"height\": {\"type\": \"integer\"},\n",
    "        \"hair_color\": {\"type\": \"string\"},\n",
    "    },\n",
    "    \"required\": [\"name\", \"height\"],\n",
    "}\n",
    "\n",
    "runnable = create_structured_output_runnable(schema, llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "fb525991-643d-4d47-9111-a3d4364c03d7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'Alex', 'height': 60}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Alex is 5 feet tall. I don't know his hair color.\"\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a3d3f0d2-c9d4-4ab8-9a5a-1ddda62db6ec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'Alex', 'height': 60, 'hair_color': 'blond'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Alex is 5 feet tall. He is blond.\"\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34f3b958",
   "metadata": {},
   "source": [
    "#### Extra information\n",
    "\n",
    "Runnables constructed via `create_structured_output_runnable` generally are capable of semantic extraction, such that they can populate information that is not explicitly enumerated in the schema.\n",
    "\n",
    "Suppose we want unspecified additional information about dogs. \n",
    "\n",
    "We can use add a placeholder for unstructured extraction, `dog_extra_info`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0ed3b5e6-a7f3-453e-be61-d94fc665c16b",
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
    "Claudia is 1 feet taller Alex and jumps higher than him.\n",
    "Claudia is a brunette and has a beagle named Harry.\n",
    "Harry likes to play with other dogs and can always be found\n",
    "playing with Milo, a border collie that lives close by.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "be07928a-8022-4963-a15e-eb3097beef9f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "People(people=[Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None), Person(person_name='Claudia', person_height=72, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry', dog_extra_info='likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.')])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class Person(BaseModel):\n",
    "    person_name: str\n",
    "    person_height: int\n",
    "    person_hair_color: str\n",
    "    dog_breed: Optional[str]\n",
    "    dog_name: Optional[str]\n",
    "    dog_extra_info: Optional[str]\n",
    "\n",
    "\n",
    "class People(BaseModel):\n",
    "    \"\"\"Identifying information about all people in a text.\"\"\"\n",
    "\n",
    "    people: Sequence[Person]\n",
    "\n",
    "\n",
    "runnable = create_structured_output_runnable(People, llm)\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a949c60",
   "metadata": {},
   "source": [
    "This gives us additional information about the dogs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97ed9f5e-33be-4667-aa82-af49cc874e1d",
   "metadata": {},
   "source": [
    "### Specifying extraction mode\n",
    "\n",
    "`create_structured_output_runnable` supports varying implementations of the underlying extraction under the hood, which are configured via the `mode` parameter. This parameter can be one of `\"openai-functions\"`, `\"openai-tools\"`, or `\"openai-json\"`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c8e0b00-d6e6-432d-b9b0-8d0a3c0c6572",
   "metadata": {},
   "source": [
    "#### OpenAI Functions and Tools"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07ccdbb1-cbe5-45af-87e4-dde42baee5eb",
   "metadata": {},
   "source": [
    "Some LLMs are fine-tuned to support the invocation of functions or tools. If they are given an input schema for a tool and recognize an occasion to use it, they may emit JSON output conforming to that schema. We can leverage this to drive structured data extraction from natural language.\n",
    "\n",
    "OpenAI originally released this via a [`functions` parameter in its chat completions API](https://openai.com/blog/function-calling-and-other-api-updates). This has since been deprecated in favor of a [`tools` parameter](https://platform.openai.com/docs/guides/function-calling), which can include (multiple) functions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6b02442-2884-4b45-a5a0-4fdac729fdb3",
   "metadata": {},
   "source": [
    "Using OpenAI Functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "7b1c2266-b04b-4a23-83a9-da3cd2f88137",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "runnable = create_structured_output_runnable(Person, llm, mode=\"openai-functions\")\n",
    "\n",
    "inp = \"Alex is 5 feet tall and has blond hair.\"\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c07427b-a582-4489-a486-4c24a6c3165f",
   "metadata": {},
   "source": [
    "Using OpenAI Tools:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0b1ca93a-ffd9-4d37-8baa-377757405357",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='Alex', person_height=152, person_hair_color='blond', dog_breed=None, dog_name=None)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "runnable = create_structured_output_runnable(Person, llm, mode=\"openai-tools\")\n",
    "\n",
    "runnable.invoke(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4018a8fc-1799-4c9d-b655-a66f618204b3",
   "metadata": {},
   "source": [
    "The corresponding [LangSmith trace](https://smith.langchain.com/public/04cc37a7-7a1c-4bae-b972-1cb1a642568c/r) illustrates the tool call that generated our structured output.\n",
    "\n",
    "![Image description](../../static/img/extraction_trace_tool.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb2662d5-9492-4acc-935b-eb8fccebbe0f",
   "metadata": {},
   "source": [
    "#### JSON Mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0fd98ba-c887-4c30-8c9e-896ae90ac56a",
   "metadata": {},
   "source": [
    "Some LLMs support generating JSON more generally. OpenAI implements this via a [`response_format` parameter](https://platform.openai.com/docs/guides/text-generation/json-mode) in its chat completions API.\n",
    "\n",
    "Note that this method may require explicit prompting (e.g., OpenAI requires that input messages contain the word \"json\" in some form when using this parameter)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6b3e4679-eadc-42c8-b882-92a600083f2f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "system_prompt = \"\"\"You extract information in structured JSON formats.\n",
    "\n",
    "Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
    "\n",
    "{output_schema}\"\"\"\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\"system\", system_prompt),\n",
    "        (\"human\", \"{input}\"),\n",
    "    ]\n",
    ")\n",
    "runnable = create_structured_output_runnable(\n",
    "    Person,\n",
    "    llm,\n",
    "    mode=\"openai-json\",\n",
    "    prompt=prompt,\n",
    "    enforce_function_usage=False,\n",
    ")\n",
    "\n",
    "runnable.invoke({\"input\": inp})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b22d8262-a9b8-415c-a142-d0ee4db7ec2b",
   "metadata": {},
   "source": [
    "### Few-shot examples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a01c75f6-99d7-4d7b-a58f-b0ea7e8f338a",
   "metadata": {},
   "source": [
    "Suppose we want to tune the behavior of our extractor. There are a few options available. For example, if we want to redact names but retain other information, we could adjust the system prompt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "c5d16ad6-824e-434a-906a-d94e78259d4f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='REDACTED', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "system_prompt = \"\"\"You extract information in structured JSON formats.\n",
    "\n",
    "Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
    "\n",
    "{output_schema}\n",
    "\n",
    "Redact all names.\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [(\"system\", system_prompt), (\"human\", \"{input}\")]\n",
    ")\n",
    "runnable = create_structured_output_runnable(\n",
    "    Person,\n",
    "    llm,\n",
    "    mode=\"openai-json\",\n",
    "    prompt=prompt,\n",
    "    enforce_function_usage=False,\n",
    ")\n",
    "\n",
    "runnable.invoke({\"input\": inp})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be611688-1224-4d5a-9e34-a158b3c04296",
   "metadata": {},
   "source": [
    "Few-shot examples are another, effective way to illustrate intended behavior. For instance, if we want to redact names with a specific character string, a one-shot example will convey this. We can use a `FewShotChatMessagePromptTemplate` to easily accommodate both a fixed set of examples as well as the dynamic selection of examples based on the input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "0aeee951-7f73-4e24-9033-c81a08af14dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(person_name='#####', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.prompts import FewShotChatMessagePromptTemplate\n",
    "\n",
    "examples = [\n",
    "    {\n",
    "        \"input\": \"Samus is 6 ft tall and blonde.\",\n",
    "        \"output\": Person(\n",
    "            person_name=\"######\",\n",
    "            person_height=5,\n",
    "            person_hair_color=\"blonde\",\n",
    "        ).dict(),\n",
    "    }\n",
    "]\n",
    "\n",
    "example_prompt = ChatPromptTemplate.from_messages(\n",
    "    [(\"human\", \"{input}\"), (\"ai\", \"{output}\")]\n",
    ")\n",
    "few_shot_prompt = FewShotChatMessagePromptTemplate(\n",
    "    examples=examples,\n",
    "    example_prompt=example_prompt,\n",
    ")\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [(\"system\", system_prompt), few_shot_prompt, (\"human\", \"{input}\")]\n",
    ")\n",
    "runnable = create_structured_output_runnable(\n",
    "    Person,\n",
    "    llm,\n",
    "    mode=\"openai-json\",\n",
    "    prompt=prompt,\n",
    "    enforce_function_usage=False,\n",
    ")\n",
    "\n",
    "runnable.invoke({\"input\": inp})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51846211-e86b-4807-9348-eb263999f7f7",
   "metadata": {},
   "source": [
    "Here, the [LangSmith trace](https://smith.langchain.com/public/6fe5e694-9c04-48f7-83ff-e541da764781/r) for the chat model call shows how the one-shot example is formatted into the prompt.\n",
    "\n",
    "![Image description](../../static/img/extraction_trace_few_shot.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbd9f121",
   "metadata": {},
   "source": [
    "## Option 2: Parsing\n",
    "\n",
    "[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. \n",
    "\n",
    "As shown above, they are used to parse the output of the runnable created by `create_structured_output_runnable`.\n",
    "\n",
    "They can also be used more generally, if a LLM is instructed to emit its output in a certain format. Parsers include convenience methods for generating formatting instructions for use in prompts.\n",
    "\n",
    "Below we implement an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "64650362",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional, Sequence\n",
    "\n",
    "from langchain.output_parsers import PydanticOutputParser\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "\n",
    "class Person(BaseModel):\n",
    "    person_name: str\n",
    "    person_height: int\n",
    "    person_hair_color: str\n",
    "    dog_breed: Optional[str]\n",
    "    dog_name: Optional[str]\n",
    "\n",
    "\n",
    "class People(BaseModel):\n",
    "    \"\"\"Identifying information about all people in a text.\"\"\"\n",
    "\n",
    "    people: Sequence[Person]\n",
    "\n",
    "\n",
    "# Run\n",
    "query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blond.\"\"\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = PydanticOutputParser(pydantic_object=People)\n",
    "\n",
    "# Prompt\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "# Run\n",
    "_input = prompt.format_prompt(query=query)\n",
    "model = ChatOpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "727f3bf2-31b1-4b07-94f5-9568acf3ffdf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output = model.invoke(_input.to_string())\n",
    "\n",
    "parser.parse(output.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "826899df",
   "metadata": {},
   "source": [
    "We can see from the [LangSmith trace](https://smith.langchain.com/public/aec42dd3-d471-4d34-801b-20dd88444931/r) that we get the same output as above.\n",
    "\n",
    "![Image description](../../static/img/extraction_trace_parsing.png)\n",
    "\n",
    "We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "837c350e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Joke(setup=\"Why couldn't the bicycle find its way home?\", punchline='Because it lost its bearings!')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define your desired data structure.\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
    "\n",
    "    # You can add custom validation logic easily with Pydantic.\n",
    "    @validator(\"setup\")\n",
    "    def question_ends_with_question_mark(cls, field):\n",
    "        if field[-1] != \"?\":\n",
    "            raise ValueError(\"Badly formed question!\")\n",
    "        return field\n",
    "\n",
    "\n",
    "# And a query intended to prompt a language model to populate the data structure.\n",
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = PydanticOutputParser(pydantic_object=Joke)\n",
    "\n",
    "# Prompt\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "# Run\n",
    "_input = prompt.format_prompt(query=joke_query)\n",
    "model = ChatOpenAI(temperature=0)\n",
    "output = model.invoke(_input.to_string())\n",
    "parser.parse(output.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3601bde",
   "metadata": {},
   "source": [
    "As we can see, we get an output of the `Joke` class, which respects our originally desired schema: 'setup' and 'punchline'.\n",
    "\n",
    "We can look at the [LangSmith trace](https://smith.langchain.com/public/557ad630-af35-43e9-b043-93800539025f/r) to see exactly what is going on under the hood.\n",
    "\n",
    "### Going deeper\n",
    "\n",
    "* The [output parser](/docs/modules/model_io/output_parsers/) documentation includes various parser examples for specific types (e.g., lists, datetime, enum, etc).  \n",
    "* The experimental [Anthropic function calling](https://python.langchain.com/docs/integrations/chat/anthropic_functions) support provides similar functionality to Anthropic chat models.\n",
    "* [LlamaCPP](https://python.langchain.com/docs/integrations/llms/llamacpp#grammars) natively supports constrained decoding using custom grammars, making it easy to output structured content using local LLMs \n",
    "* [JSONFormer](/docs/integrations/llms/jsonformer_experimental) offers another way for structured decoding of a subset of the JSON Schema.\n",
    "* [Kor](https://eyurtsev.github.io/kor/) is another library for extraction where schema and examples can be provided to the LLM."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aab95ecf",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
